Efficient lattice representation and generation

Authors

  • Fuliang Weng
  • Andreas Stolcke
  • Ananth Sankar
Abstract

In large-vocabulary, multi-pass speech recognition systems, it is desirable to generate word lattices incorporating a large number of hypotheses while keeping the lattice sizes small. We describe two new techniques for reducing word lattice sizes without eliminating hypotheses. The first technique is an algorithm to reduce the size of non-deterministic bigram word lattices. The algorithm iteratively combines lattice nodes and transitions if local properties show that this does not change the set of allowed hypotheses. On bigram word lattices generated from Hub4 Broadcast News speech, it reduces lattice sizes by half on average. It was also found to produce smaller lattices than the standard finite state automaton determinization and minimization algorithms. The second technique is an improved algorithm for expanding lattices with trigram language models. Instead of giving all nodes a unique trigram context, this algorithm only creates unique contexts for trigrams that are explicitly represented in the model. Backed-off trigram probabilities are encoded without node duplication by factoring the probabilities into bigram probabilities and backoff weights. Experiments on Broadcast News show that this method reduces trigram lattice sizes by a factor of 6, and reduces expansion time by more than a factor of 10. Compared to conventionally expanded lattices, recognition with the compactly expanded lattices was also found to be 40% faster, without affecting recognition accuracy.
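The backoff factoring mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables, their values, and the function name trigram_logp are all hypothetical, and the only thing assumed about the model is the standard backoff form, in which a trigram that is not stored explicitly is scored as the backoff weight of its two-word history times the bigram probability (a sum in the log domain).

```python
# Hypothetical toy backoff model with log10 scores (values are made up).
TRIGRAM_LOGP = {("saw", "the", "cat"): -0.3}      # explicitly stored trigrams
BIGRAM_LOGP = {("the", "cat"): -0.5, ("the", "dog"): -0.7}
BACKOFF_LOGW = {("saw", "the"): -0.2}             # backoff weight per history


def trigram_logp(w1: str, w2: str, w3: str) -> float:
    """Return log10 P(w3 | w1, w2) under a standard backoff model."""
    explicit = TRIGRAM_LOGP.get((w1, w2, w3))
    if explicit is not None:
        # Explicit trigram: this is the case that needs its own context.
        return explicit
    # Backed-off trigram: the score factors into two independent terms,
    # a weight that depends only on the history (w1, w2) and a bigram
    # score that depends only on (w2, w3). A history with no stored
    # weight backs off with weight 1, i.e. 0.0 in the log domain.
    return BACKOFF_LOGW.get((w1, w2), 0.0) + BIGRAM_LOGP[(w2, w3)]
```

Because the backed-off score splits into a history-only term and a bigram-only term, the two terms can plausibly be placed on separate lattice transitions that share a bigram-context node, so only explicit trigrams such as ("saw", "the", "cat") in this toy table would require a dedicated context.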

Similar resources

Representation theorems of $L-$subsets and $L-$families on complete residuated lattice

In this paper, our purpose is twofold. Firstly, the tensor and residuum operations on $L-$nested systems are introduced under the condition of complete residuated lattice. Then we show that $L-$nested systems form a complete residuated lattice, which is precisely the classical isomorphic object of complete residuated power set lattice. Thus the new representation theorem of $L-$subsets on complete re...

Square Lattice Elliptical-Core Photonic Crystal Fiber Soliton-Effect Compressor at 1550 nm

In this paper, we investigate the evolution of supercontinuum and femtosecond optical pulse generation through a square lattice elliptical-core photonic crystal fiber (PCF) at 1550 nm, using both the full-vector multipole method (M.P.M) and novel concrete algorithms: symmetric split-step Fourier (SSF) and fourth-order Runge-Kutta (RK4), which is an accurate method to solve the general nonlinear...

COMPUTATIONAL ENUMERATION OF POINT DEFECT CLUSTERS IN DOUBLE-LATTICE CRYSTALS

The cluster representation matrices have already been successfully used to enumerate close-packed vacancy clusters in all single-lattice crystals [1, 2]. Point defect clusters in double-lattice crystals may have identical geometry but are distinct due to unique atomic positions enclosing them. The method of representation matrices is extended to make it applicable to represent and enumerate ...

Representation of a digitized surface by triangular faces

This paper proposes a triangulation method for a digitized surface whose points are located on a regular lattice. The method relies on an iterative and adaptive splitting of triangular faces of an initial polyhedral surface. Assuming a bijection between the digitized surface and its approximation, a partition of the data base is performed. The method allows the measurement of the local quality ...

An Efficient Genetic Algorithm for Solving the Multi-Mode Resource-Constrained Project Scheduling Problem Based on Random Key Representation

In this paper, a new genetic algorithm (GA) is presented for solving the multi-mode resource-constrained project scheduling problem (MRCPSP) with minimization of project makespan as the objective subject to resource and precedence constraints. A random key and the related mode list (ML) representation scheme are used as encoding schemes and the multi-mode serial schedule generation scheme (MSSG...

Publication date: 1998